
1. Nobel Prize winners! - Importing the data

The Nobel Prize is perhaps the world's best-known scientific award. Besides the honor, prestige, and substantial prize money, the recipient also gets a gold medal showing Alfred Nobel (1833 - 1896), who established the prize. Every year it is given to scientists and scholars in the categories chemistry, literature, physics, physiology or medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time the Prize was very Eurocentric and male-focused, but nowadays it's not biased in any way whatsoever. Surely.

Well, I'm going to find out! The Nobel Foundation has made available a dataset of all prize winners from the start of the prize in 1901 up to 2016, which I have downloaded.

In [34]:
import pandas as pd
import seaborn as sns 
import numpy as np
import matplotlib.pyplot as plt
import os

# Optional: export a notebook to HTML ('yourNotebook.ipynb' is a placeholder filename)
os.system('jupyter nbconvert --to html yourNotebook.ipynb')

# Reading in the Nobel Prize data
data = pd.read_csv(r'C:\Users\Adam\Desktop\Personal\Python Projects\Github\3) Nobel-Prizes\Nobel Prize Data.csv')

data.head()
Out[34]:
|  | Year | Category | Prize | Motivation | Prize Share | Laureate ID | Laureate Type | Full Name | Birth Date | Birth City | Birth Country | Sex | Organization Name | Organization City | Organization Country | Death Date | Death City | Death Country |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1901 | Chemistry | The Nobel Prize in Chemistry 1901 | "in recognition of the extraordinary services ... | 1/1 | 160 | Individual | Jacobus Henricus van 't Hoff | 1852-08-30 | Rotterdam | Netherlands | Male | Berlin University | Berlin | Germany | 1911-03-01 | Berlin | Germany |
| 1 | 1901 | Literature | The Nobel Prize in Literature 1901 | "in special recognition of his poetic composit... | 1/1 | 569 | Individual | Sully Prudhomme | 1839-03-16 | Paris | France | Male | NaN | NaN | NaN | 1907-09-07 | Châtenay | France |
| 2 | 1901 | Medicine | The Nobel Prize in Physiology or Medicine 1901 | "for his work on serum therapy, especially its... | 1/1 | 293 | Individual | Emil Adolf von Behring | 1854-03-15 | Hansdorf (Lawice) | Prussia (Poland) | Male | Marburg University | Marburg | Germany | 1917-03-31 | Marburg | Germany |
| 3 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | 462 | Individual | Jean Henry Dunant | 1828-05-08 | Geneva | Switzerland | Male | NaN | NaN | NaN | 1910-10-30 | Heiden | Switzerland |
| 4 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | 463 | Individual | Frédéric Passy | 1822-05-20 | Paris | France | Male | NaN | NaN | NaN | 1912-06-12 | Paris | France |
2. So, who gets the Nobel Prize?

Among the very first Nobel laureates is already a celebrity: Wilhelm Conrad Röntgen, who discovered X-rays and won the 1901 Physics prize. In fact, all of the winners in 1901 were men who came from Europe. But that was back in 1901. Looking at all winners in the dataset, from 1901 to 2016, which sex and which country are the most commonly represented?

(For country, I will use the Birth Country of the winner, as the Organization Country is NaN for all shared Nobel Prizes.)
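One quick way to compare the two country columns is to count how often each one is missing. The sketch below is only an illustration: it assumes a small hand-built frame (`sample`, a hypothetical name) that copies three columns from the five rows of the `head()` output above, not the full CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame copied from the five rows of data.head() above;
# the real check would run on the full `data` frame instead.
sample = pd.DataFrame({
    'Prize Share': ['1/1', '1/1', '1/1', '1/2', '1/2'],
    'Birth Country': ['Netherlands', 'France', 'Prussia (Poland)',
                      'Switzerland', 'France'],
    'Organization Country': ['Germany', np.nan, 'Germany', np.nan, np.nan],
})

# Fraction of missing values in each candidate country column
missing_share = sample[['Birth Country', 'Organization Country']].isna().mean()
print(missing_share)

# Number of missing organization countries for each prize share
missing_by_share = (sample['Organization Country'].isna()
                    .groupby(sample['Prize Share']).sum())
print(missing_by_share)
```

On the full dataset the same two expressions, applied to `data`, would show how much coverage each column really has.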
In [29]:
# Number of (possibly shared) Nobel Prizes handed
# out between 1901 and 2016
print('number of prizes: ' + str(len(data)))

# Number of prizes won by male and female recipients
# (rows with a missing Sex value are not counted here)
print(data.groupby('Sex').Year.count())

# Number of prizes won by the top 10 birth countries
data['Birth Country'].value_counts().head(10)
number of prizes: 969
Sex
Female     50
Male      893
Name: Year, dtype: int64
Out[29]:
United States of America    276
United Kingdom               88
Germany                      70
France                       53
Sweden                       30
Japan                        29
Russia                       20
Netherlands                  19
Italy                        18
Canada                       18
Name: Birth Country, dtype: int64

3. USA dominance

Not so surprising perhaps: the most common Nobel laureate between 1901 and 2016 was a man born in the United States of America. But in 1901 all the winners were European. When did the USA start to dominate the Nobel Prize charts?

In [38]:
# Calculating the proportion of USA born winners per decade
data['usa_born_winner'] = data['Birth Country'] == 'United States of America'
data['decade'] = (np.floor(data.Year/10)*10).astype(int)

# The proportion of USA-born winners per decade, keeping the column names for plotting

prop_usa_winners = data.groupby('decade',as_index=False)['usa_born_winner'].mean()


# Setting the plotting theme & size of plots
sns.set()
plt.rcParams['figure.figsize'] = [11, 7]

# Plotting the proportion of USA-born winners per decade
ax = sns.lineplot(x='decade', y='usa_born_winner', data=prop_usa_winners)

# Adding %-formatting to the y-axis
from matplotlib.ticker import PercentFormatter

ax.yaxis.set_major_formatter(PercentFormatter(1.0))
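
The decade column in the cell above is built by flooring each year to a multiple of ten. As a small sanity check, assuming a few hand-picked sample years (chosen for illustration, not read from the dataset), integer floor division gives the same bins without going through floats:

```python
import numpy as np
import pandas as pd

# Hand-picked sample years, for illustration only
years = pd.Series([1901, 1909, 1910, 1999, 2016])

# The approach used in the notebook: floor to a multiple of ten via floats
decade_floor = (np.floor(years / 10) * 10).astype(int)

# Equivalent integer-only version using floor division
decade_intdiv = (years // 10) * 10

print(decade_floor.tolist())
print(decade_intdiv.tolist())
```

Both versions put 1901 and 1909 into the 1900s and 2016 into the 2010s, so either could be used for the `decade` column.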

4. What is the gender of a typical Nobel Prize winner?

So the USA first became the dominant winner of the Nobel Prize in the 1930s and has kept the leading position ever since. But one group that was in the lead from the start, and never seems to let go, is men. Maybe it shouldn't come as a shock that there is some imbalance between how many male and female prize winners there are, but how significant is this imbalance? And is it better or worse within specific prize categories like physics, medicine, literature, etc.?

In [46]:
# Calculating the proportion of female laureates per decade
data['female_winner'] = data['Sex'] == 'Female'
prop_female_winners = data.groupby(['decade','Category'],as_index=False)['female_winner'].mean()

# Plotting the proportion of female winners per decade and category,
# with % winners on the y-axis
ax = sns.barplot(x='decade', y='female_winner', data=prop_female_winners, hue='Category')

ax.yaxis.set_major_formatter(PercentFormatter(1.0))
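
The proportions in the cell above rely on the fact that the mean of a boolean column is the share of `True` values. Because some decade and category groups are small, it can help to look at the group size next to the share. The following is a minimal sketch on invented toy rows (the frame `toy` and its contents are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy rows, invented for illustration (not from the dataset)
toy = pd.DataFrame({
    'decade':   [1900, 1900, 1900, 1900, 2010, 2010],
    'Category': ['Physics', 'Physics', 'Peace', 'Peace', 'Peace', 'Peace'],
    'Sex':      ['Male', 'Female', 'Male', 'Male', 'Female', 'Female'],
})
toy['female_winner'] = toy['Sex'] == 'Female'

# mean of booleans = share of True; count = number of rows in the group
summary = (toy.groupby(['decade', 'Category'])['female_winner']
              .agg(share='mean', n='count')
              .reset_index())
print(summary)
```

A share of 100% based on two rows, as in the toy 2010s group, deserves much less weight than the same share based on dozens of laureates.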

5. The first woman to win the Nobel Prize

The plot above is a bit messy, as the bars for the different categories crowd each other. But it does show some interesting trends and patterns. Overall the imbalance is pretty large, with physics, economics, and chemistry having the largest imbalance. Medicine has a somewhat positive trend, and since the 1990s the literature prize has also become more balanced. The big outlier is the peace prize during the 2010s, but keep in mind that this just covers the years 2010 to 2016.

Given this imbalance, who was the first woman to receive a Nobel Prize? And in what category?

In [65]:
Female_winners = data[data.Sex =="Female"]

First_female = Female_winners.nsmallest(1,'Year').reset_index()

print('First female winner: ' + str(First_female['Full Name'][0]) + ' Category: ' + str(First_female['Category'][0]))
First female winner: Marie Curie, née Sklodowska Category: Physics

6. How old are you when you get the prize?

In [69]:
# Converting 'Birth Date' from string to datetime; unparseable values become NaT
data['Birth Date'] = pd.to_datetime(data['Birth Date'].astype(str), format='%Y-%m-%d', errors='coerce')

# Calculating the age of Nobel Prize winners
data['Age'] = data['Year']  -  data['Birth Date'].dt.year

# Plotting the age of Nobel Prize winners
sns.lmplot(x ='Year' , y='Age', data=data , lowess=True, aspect=2, line_kws={'color' : 'black'} )


# Same plot, but with one row per prize category
sns.lmplot(x='Year', y='Age', data=data, row='Category', lowess=True, aspect=2, line_kws={'color': 'black'})
Out[69]:
<seaborn.axisgrid.FacetGrid at 0x24d9c4d1208>
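
The age calculation above depends on `errors='coerce'`, which turns missing or unparseable birth dates into `NaT`, so the resulting `Age` becomes `NaN` instead of raising an error. A minimal sketch, assuming three hand-written rows (two birth dates copied from the `head()` output above plus one deliberately missing value; the frame `toy` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Two birth dates from data.head() plus one deliberately missing value
toy = pd.DataFrame({
    'Year': [1901, 1901, 1901],
    'Birth Date': ['1852-08-30', '1839-03-16', np.nan],
})

# astype(str) turns NaN into the string 'nan', which does not match the
# format, so errors='coerce' converts it to NaT
toy['Birth Date'] = pd.to_datetime(toy['Birth Date'].astype(str),
                                   format='%Y-%m-%d', errors='coerce')

# Award year minus birth year; rows with NaT end up with a NaN age
toy['Age'] = toy['Year'] - toy['Birth Date'].dt.year

print(toy)
```

Note that this age is a difference of calendar years, so it can be off by one for laureates whose birthday falls after the award date.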